Privacy-Preserving Release of Spatio-temporal Density

Author: A E Cicek
A Monreale
A Rajaraman
Ashwin Machanavajjhala
Benjamin C. M. Fung
C Li
Cynthia Dwork
G Kellaris
G Poulis
Georgios Kellaris
I Goodfellow
Latanya Sweeney
Linus Bengtsson
M E Nergiz
Manolis Terrovitis
Marta C. González
N Victor
P Neirotti
R Kitchin
Shubha U. Nabar
W Qardaji
X He
Y Xiao
Publication venue: Springer International Publishing
Publication date: 01/01/2018

In today’s digital society, increasing amounts of contextually rich spatio-temporal information are collected and used, e.g., for knowledge-based decision making, research purposes, optimizing operational phases of city management, planning infrastructure networks, or developing timetables for public transportation with an increasingly autonomous vehicle fleet. At the same time, however, publishing or sharing spatio-temporal data, even in aggregated form, is not always viable owing to the danger of violating individuals’ privacy, along with the related legal and ethical repercussions. In this chapter, we review some fundamental approaches for anonymizing and releasing spatio-temporal density, i.e., the number of individuals visiting a given set of locations as a function of time. These approaches follow different privacy models that provide different privacy guarantees and different levels of accuracy for the released anonymized data. We demonstrate some sanitization (anonymization) techniques with provable privacy guarantees by releasing the spatio-temporal density of Paris, France. We conclude that, in order to achieve meaningful accuracy, the sanitization process has to be carefully customized to the application and public characteristics of the spatio-temporal data.

Crossref

INRIA a CCSD electronic archive server

Repository of the Academy's Library

Scalable exploration of physical database design

Author: Arnd Christian König
Shubha U. Nabar
Publication date: 01/01/2006

Physical database design is critical to the performance of a large-scale DBMS. The corresponding automated design tuning tools need to select the best physical design from a large set of candidate designs quickly. However, for large workloads, evaluating the cost of each query in the workload for every candidate does not scale. To overcome this, we present a novel comparison primitive that only evaluates a fraction of the workload and provides an accurate estimate of the likelihood of selecting correctly. We show how to use this primitive to construct accurate and scalable selection procedures. Furthermore, we address the issue of ensuring that the estimates are conservative, even for highly skewed cost distributions. The proposed techniques are evaluated through a prototype implementation inside a commercial physical design tool.

CiteSeerX

Crossref

Auditing a batch of SQL queries

Author: Dilys Thomas
Rajeev Motwani
Shubha U. Nabar
Publication date: 01/01/2007

In this paper, we study the problem of auditing a batch of SQL queries: given a set of SQL queries that have been posed over a database, determine whether some subset of these queries has revealed private information about an individual or group of individuals. In [2], the authors studied the problem of determining whether any single SQL query in isolation revealed information forbidden by the database system’s data disclosure policies. In this paper, we extend this work to the problem of auditing a batch of SQL queries. We define two different notions of auditing, semantic auditing and syntactic auditing, and show that while syntactic auditing seems more desirable, it is in fact NP-hard to achieve. The problem of semantic auditing of a batch of SQL queries is, however, tractable, and we give a polynomial-time algorithm for this purpose.

CiteSeerX

Crossref

Link Privacy in Social Networks

Author: Aleksandra Korolova
Rajeev Motwani
Shubha U. Nabar
Ying Xu
Publication date: 01/01/2008

We consider a privacy threat to a social network in which the goal of an attacker is to obtain knowledge of a significant fraction of the links in the network. We formalize the typical social network interface and the information about links that it provides to its users in terms of lookahead. We consider a particular threat where an attacker subverts user accounts to get information about local neighborhoods in the network and pieces them together in order to get a global picture. We analyze, both experimentally and theoretically, the number of user accounts an attacker would need to subvert for a successful attack, as a function of his strategy for choosing users whose accounts to subvert and of the lookahead provided by the network. We conclude that such an attack is feasible in practice, and thus any social network that wishes to protect the link privacy of its users should take great care in choosing the lookahead of its interface, limiting it to 1 or 2 whenever possible.

CiteSeerX

Crossref

Towards Special-Purpose Indexes and Statistics for Uncertain Data

Author: Anish Das Sarma
Jennifer Widom
Parag Agrawal
Shubha U. Nabar
Publication date: 01/01/2008

Abstract. data, uncertainty, and lineage is developed on top of a conventional DBMS. Uncertain data with lineage is encoded in relational tables, and Trio queries are translated to SQL queries on the encoding. Such a layered approach reaps significant benefits in terms of architectural simplicity and the ability to use an off-the-shelf query processing engine. In this paper, we present special-purpose indexes and statistics that complement the layered approach to further enhance its performance. First, we identify a well-defined structure of Trio queries, relations, and their encoding that can be exploited by the underlying query optimizer to improve the performance of Trio’s layered approach. We propose several mechanisms for indexing Trio’s uncertain relations and study when these indexes are useful. We then present an interesting order, and an associated operator, which are especially useful to consider when composing query plans. The decision of which query plan to use for a Trio query is dictated by various statistical properties of the input data. We identify the statistical data that can guide the underlying optimizer, and design histograms that enable estimating the statistics accurately.

CiteSeerX